GPGPU parallel algorithms for structured-grid CFD codes
Authors
Abstract
A new high-performance general-purpose graphics processing unit (GPGPU) computational fluid dynamics (CFD) library is introduced for use with structured-grid CFD algorithms. The library includes a novel set of parallel tridiagonal matrix solvers implemented in CUDA. These solvers support both scalar and block-tridiagonal matrices suitable for approximate factorization (AF) schemes. The computational routines are designed both for fully GPU-based CFD codes and for use as a GPU accelerator for CPU-based algorithms. Among other components, the library also includes a collection of finite-volume routines for computing local and global stable time steps, inviscid surface fluxes, and face-, node-, and cell-centered interpolation on generalized 3D, multi-block structured grids. GPU block-tridiagonal benchmarks showed a speed-up of 3.6x over an OpenMP CPU implementation of the Thomas algorithm when host-device data transfers were excluded. Detailed analysis shows that, on the GPU, storing the matrices in a structure-of-arrays (SOA) format rather than an array-of-structures (AOS) format improved parallel block-tridiagonal performance by a factor of 2.6 for the parallel cyclic reduction (PCR) algorithm. The GPU block-tridiagonal solver was also applied to the OVERFLOW-2 CFD code. Performance measurements using both synchronous and asynchronous data transfers within OVERFLOW-2 showed poorer performance than the cache-optimized CPU Thomas algorithm. This was attributed to the significant cost of the host-device transfers of the rank-5 sub-matrices and sub-vectors and of the matrix format conversion. The finite-volume maximum time-step and inviscid flux kernels were benchmarked within the MBFLO3 CFD code and showed speed-ups of 3.2x to 4.3x over optimized CPU code, including the cost of host-device memory transfers.
It was determined, however, that the GPU speed-up could reach 21x over a single CPU core if host-device data transfers were eliminated or significantly reduced.
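The abstract contrasts the serial Thomas algorithm with parallel cyclic reduction (PCR). The following is a minimal sketch of both for the scalar tridiagonal case, written in plain Python rather than CUDA so that the data dependencies are easy to follow. The function names `thomas` and `pcr` are hypothetical and are not the library's API. A block version, as used for the rank-5 systems in the abstract, would replace each scalar division with multiplication by the inverse of a 5x5 block. In PCR, every row in a pass is updated independently from its neighbors at distance `stride`, which is what allows a GPU to assign one thread per row; here those passes simply run serially.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm for a scalar tridiagonal system.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination removes the sub-diagonal one row at a time.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr(a, b, c, d):
    """Parallel cyclic reduction (PCR) for the same system.

    Each pass eliminates the couplings to rows i - stride and i + stride,
    so every row can be updated independently (one GPU thread per row).
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0], c[-1] = 0.0, 0.0  # no coupling outside the system
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        for i in range(n):  # independent updates: a parallel loop on a GPU
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # After the last pass every row is decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Usage: a diagonally dominant test system.
n = 8
a = [0.0] + [1.0] * (n - 1)
b = [4.0] * n
c = [1.0] * (n - 1) + [0.0]
d = [float(i + 1) for i in range(n)]
x_thomas = thomas(a, b, c, d)
x_pcr = pcr(a, b, c, d)
```

PCR needs about log2(n) passes of O(n) work each, so it performs more arithmetic than the O(n) Thomas algorithm but exposes parallelism across every row of every pass.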
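The abstract attributes a 2.6x PCR speed-up to storing the block matrices in structure-of-arrays (SOA) rather than array-of-structures (AOS) order, and it counts matrix format conversion among the costs observed in OVERFLOW-2. Because the abstract does not give the library's exact layout, the index maps below are an assumed, illustrative layout for a batch of `n_sys` independent block-tridiagonal systems with `n_rows` rows of 5x5 blocks; `aos_index`, `soa_index`, and `aos_to_soa` are hypothetical names. The sketch shows the address stride between adjacent GPU threads (one thread per system) reading the same block entry: in AOS it is `n_rows * 25` elements, in SOA it is 1, which is what makes coalesced memory access possible.

```python
BLOCK = 5  # rank-5 (5x5) blocks, as in the abstract


def aos_index(sys_id, row, r, col, n_rows):
    """Assumed AOS layout: the 25 entries of a block are contiguous,
    and the blocks of one system follow each other."""
    return ((sys_id * n_rows + row) * BLOCK + r) * BLOCK + col


def soa_index(sys_id, row, r, col, n_rows, n_sys):
    """Assumed SOA layout: entry (r, col) of a given row is contiguous
    across all systems, so thread sys_id and sys_id + 1 touch neighbors."""
    return ((r * BLOCK + col) * n_rows + row) * n_sys + sys_id


def aos_to_soa(buf, n_rows, n_sys):
    """Reorder a flat AOS buffer into SOA order (a format conversion
    of the kind whose cost the abstract reports)."""
    out = [0.0] * len(buf)
    for s in range(n_sys):
        for i in range(n_rows):
            for r in range(BLOCK):
                for col in range(BLOCK):
                    out[soa_index(s, i, r, col, n_rows, n_sys)] = buf[aos_index(s, i, r, col, n_rows)]
    return out


# Stride between threads s and s + 1 reading the same entry (row 0, r = 0, col = 0).
n_rows, n_sys = 4, 32
aos_stride = aos_index(1, 0, 0, 0, n_rows) - aos_index(0, 0, 0, 0, n_rows)
soa_stride = soa_index(1, 0, 0, 0, n_rows, n_sys) - soa_index(0, 0, 0, 0, n_rows, n_sys)
```

Here `aos_stride` is 100 elements while `soa_stride` is 1, so a warp of threads reading the same entry in SOA order touches one contiguous run of memory.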
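The abstract mentions routines for local and global stable time steps but does not state the formula used. A common finite-volume choice, assumed here, divides the cell volume by the sum over the three grid directions of the convective spectral radius |u·S| + c|S|, where S is the face-area vector for that direction and c is the speed of sound; the global step is then the minimum over all cells. The function names and the cell tuple layout below are hypothetical.

```python
import math


def local_time_step(volume, velocity, sound_speed, face_vectors, cfl):
    """Assumed estimate: dt = CFL * V / sum_dirs(|u . S| + c |S|)."""
    spectral_radius = 0.0
    for s in face_vectors:  # one averaged face-area vector per grid direction (i, j, k)
        u_n = sum(u * sx for u, sx in zip(velocity, s))
        area = math.sqrt(sum(sx * sx for sx in s))
        spectral_radius += abs(u_n) + sound_speed * area
    return cfl * volume / spectral_radius


def global_time_step(cells, cfl):
    """Global step (e.g. for time-accurate runs): the smallest local step over all cells."""
    return min(local_time_step(vol, vel, c, faces, cfl) for vol, vel, c, faces in cells)


# Usage: two unit-cube cells with axis-aligned unit faces.
unit_faces = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cells = [
    (1.0, (1.0, 0.0, 0.0), 1.0, unit_faces),  # moving cell: radius 2 + 1 + 1 = 4
    (1.0, (0.0, 0.0, 0.0), 1.0, unit_faces),  # fluid at rest: radius 1 + 1 + 1 = 3
]
dt_global = global_time_step(cells, cfl=1.0)
```

On a GPU, the per-cell estimate would map naturally to one thread per cell, and the global minimum to a parallel min-reduction.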
Similar resources
Parallel unstructured mesh CFD codes: a role for recursive clustering techniques in mesh decomposition
In principle, unstructured mesh CFD codes can be parallelised using a mesh decomposition approach similar to structured mesh codes. However, for unstructured codes the mesh structure is problem dependent and algorithms for automatically decomposing the mesh onto the processors are required. An algorithm based upon a recursive clustering technique, for decomposing meshes into an arbitrary number...
A case study of the partitioning patterns for domain decomposition method on VPP700E
The most common parallelization strategy for many Computational Mechanics applications (typified by Computational Fluid Dynamics (CFD) applications) that use structured grids involves a one-directional partition based upon slabs of grids. For parallelised versions of CFD codes to scale well, we must employ two- (or more) dimensional partitions. However, FORTRAN code implementations by multi-directional par...
Generating Binary Optimal Codes Using Heterogeneous Parallel Computing
Generation of optimal codes is a well-known problem in coding theory. Many computational approaches exist in the literature for finding record-breaking codes. However, generating codes with long lengths n using serial algorithms is computationally very expensive; for example, the worst-case time complexity of a Greedy algorithm is O(n^4). In order to improve the efficiency of generating codes wit...
Advanced Optimizations of An Implicit Navier-Stokes Solver on GPGPU
General-purpose computing on graphics processing units (GPGPU) is a massive fine-grain parallel computation platform, which is particularly attractive for CFD tasks due to its potential for one or two orders of magnitude of performance improvement with relatively low capital investment. Many successful attempts have been reported in recent years (see, for example, [1, 2, 3, 4, 5, 6]). Although early at...
Strategies for Parallel and Numerical Scalability of CFD Codes
In this article we discuss a strategy for speeding up the solution of the Navier-Stokes equations on highly complex solution domains such as complete aircraft, spacecraft, or turbomachinery equipment. We have used a finite-volume code for the (non-turbulent) Navier-Stokes equations as a testbed for implementation of linked numerical and parallel processing techniques. Speedup is achieved by the Ta...
Publication date: 2011